Introduction

The standard means of diagnosing acute myeloid leukemia (AML) is bone marrow biopsy, a procedure that can be costly and time-consuming and can therefore hinder early detection. Abnormalities in peripheral bloodwork, such as complete blood counts (CBCs), can help clinicians determine whether a confirmatory biopsy is warranted. The objectives of this study are to evaluate whether our machine learning (ML) algorithms can use CBCs to: 1) classify AML versus negative-control patients and 2) generate a probability ranking highlighting the two most likely diagnoses from the following options: AML, acute promyelocytic leukemia (APL), myelodysplastic syndrome (MDS), preleukemic (PL), or none.

Methods

A total of 2102 CBC results from two academic centers were retrospectively analyzed. Dataset 1, from the National Center for Cancer Care and Research in Qatar, included a learning and evaluation set of 964 CBCs from control (n=782), AML (n=150), and APL (n=32) patients for algorithmic training. Dataset 2, from the University of Pennsylvania Health System, included 1138 CBCs for external validation from AML (n=513), APL (n=21), MDS (n=376), and PL (n=228) patients. CBCs were analyzed at five timepoints relative to the date of diagnosis (DOD): 6 months (91-183 days) prior, 3 months (61-90 days) prior, 2 months (31-60 days) prior, 1 month (8-30 days) prior, and DOD (0-7 days). Two algorithms were used for our model: a support vector machine (SVM) and a histogram-based gradient boosting classifier (HGBC). SVM uses a single best-fit boundary to separate classes, whereas HGBC combines many small decision trees to improve accuracy. Our ML model then produced sensitivities and specificities for AML detection against negative controls. The model also provided a list of the two most likely diagnoses for each patient from AML, APL, MDS, PL, and none. Finally, the model was evaluated for its ability to identify cases as “needs follow-up” and controls as “no immediate need for follow-up.” This output was used to calculate sensitivities for CBCs grouped by collection timepoint.
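The timepoint windows and the two-most-likely-diagnoses output described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names `assign_timepoint` and `top_two` are hypothetical, and the probabilities in the usage note are made-up values rather than model output.

```python
from typing import Dict, List, Optional

# Timepoint windows as stated in the Methods, in days relative to the
# date of diagnosis (DOD). Both bounds are inclusive.
TIMEPOINT_WINDOWS = [
    ("DOD", 0, 7),
    ("1-month", 8, 30),
    ("2-months", 31, 60),
    ("3-months", 61, 90),
    ("6-months", 91, 183),
]


def assign_timepoint(days_before_dod: int) -> Optional[str]:
    """Return the timepoint label for a CBC, or None if it falls outside all windows."""
    for label, low, high in TIMEPOINT_WINDOWS:
        if low <= days_before_dod <= high:
            return label
    return None


def top_two(probabilities: Dict[str, float]) -> List[str]:
    """Return the two diagnoses with the highest predicted probability.

    Ties keep the order in which the classes appear in the input mapping,
    because Python's sort is stable.
    """
    ranked = sorted(probabilities, key=lambda name: probabilities[name], reverse=True)
    return ranked[:2]
```

For example, `assign_timepoint(45)` falls in the 2-month window, and `top_two({"AML": 0.5, "APL": 0.3, "MDS": 0.1, "PL": 0.05, "none": 0.05})` ranks AML first and APL second.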

Results

During training, the HGBC model exhibited 99±1% accuracy in distinguishing AML (n=150) from negative controls (n=219) using DOD CBC data from dataset 1. When externally validated using dataset 2 CBCs collected within 6 months of DOD, the model had an accuracy of 95%. The model also achieved a specificity of 99.3% for negative cases using both datasets 1 and 2. Using CBCs from dataset 2, our model incorrectly identified all APL cases as AML. In 87.7% of cases across all diagnoses, using CBCs collected within 6 months of DOD, the model's two most likely predictions included the patient's true diagnosis. To assess the model's ability to classify AML, APL, PL, and MDS cases as needing follow-up and negative cases as not needing follow-up, CBCs across various timepoints were used for training. Our model showed declining sensitivity with increasing time from DOD among all patients with AML, APL, PL, and MDS. Sensitivities were highest at DOD (80.2%) and progressively lower at 1 month (74.2%), 2 months (59.4%), 3 months (58.1%), and 6 months (41.7%) prior to DOD.
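For readers less familiar with the metrics reported here, the following Python sketch shows how sensitivity, specificity, and a top-2 hit rate are conventionally computed from per-CBC predictions. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function names are hypothetical, and the toy labels in the usage note are illustrative and are not study data.

```python
from typing import List, Sequence


def sensitivity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of true cases flagged as needing follow-up: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)


def specificity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of controls correctly left unflagged: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tn + fp == 0:
        raise ValueError("no negative cases in y_true")
    return tn / (tn + fp)


def top2_hit_rate(true_labels: Sequence[str], top2_lists: Sequence[List[str]]) -> float:
    """Fraction of cases whose true diagnosis is among the two ranked predictions."""
    if len(true_labels) == 0:
        raise ValueError("no cases provided")
    hits = sum(1 for label, top2 in zip(true_labels, top2_lists) if label in top2)
    return hits / len(true_labels)
```

As a toy illustration, with four cases of which three are flagged and two controls of which one is flagged, `sensitivity` returns 0.75 and `specificity` returns 0.5. Computing `sensitivity` separately on the CBCs in each collection window would give a per-timepoint breakdown of the kind reported above.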

Conclusion

The CBC is a powerful tool for detecting acute leukemia in the initial stages of screening, before patients undergo more advanced diagnostics. Our model showed high specificity in distinguishing AML cases from negative controls, potentially giving clinicians the ability to rule out crucial diagnoses with a cost-effective, non-invasive, and widely accessible blood test. Furthermore, our results indicate that CBCs taken 1 month prior to DOD are more predictive of AML than CBCs taken several months prior, highlighting a rapid disease course that requires prompt evaluation. Further investigation is needed to assess whether our model can serve as a clinical decision support tool to help rule in AML and the need for further testing, and to assess its ability to differentiate AML from other hematologic malignancies.
